**COMP1411 (Spring 2023) Introduction to Computer Systems**

**(UPDATED on 22PM, 10-Mar-2023)**

Individual Assignment 2 Duration: 12:00, 10-Mar-2023 ~ 23:59, 12-Mar-2023

|  |  |
| --- | --- |
| *Name* | **VENKATESAN Jyotsna** |
| *Student number* | **22108825D** |

**Question 1**. [3 marks]

In this question, we use the Y86-64 instruction set (please refer to Lectures 4-6).

**1(a)** [1 mark]

**Write** the machine code encoding of the assembly instruction:

mrmovq 0x1356(%rbp), %rdi

Please write the bytes of the machine code in hexadecimal form, i.e., using two hexadecimal digits to represent one byte. You are allowed to leave spaces between adjacent bytes for better readability. The machine has a little-endian byte ordering.

**Show your steps. Only giving the final result will NOT get a full mark of this question.**

***Answer*:**

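The first byte (icode:ifun) of mrmovq is 50 (icode = 5, ifun = 0).

The second byte (rA:rB) is 75, because rA = %rdi has register ID 7 and rB = %rbp has register ID 5.

The displacement D = 0x1356 is stored as 8 bytes in little-endian order: 56 13 00 00 00 00 00 00

Thus, the machine code is 50 75 56 13 00 00 00 00 00 00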
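The encoding steps can be reproduced with a small Python sketch. `encode_mrmovq` and `REG_ID` are hypothetical names introduced for illustration; the register IDs follow the standard Y86-64 numbering (%rax = 0 through %r14 = 0xE).

```python
# Y86-64 register IDs (4 bits each; 0xF means "no register")
REG_ID = {
    "%rax": 0x0, "%rcx": 0x1, "%rdx": 0x2, "%rbx": 0x3,
    "%rsp": 0x4, "%rbp": 0x5, "%rsi": 0x6, "%rdi": 0x7,
    "%r8": 0x8, "%r9": 0x9, "%r10": 0xA, "%r11": 0xB,
    "%r12": 0xC, "%r13": 0xD, "%r14": 0xE,
}


def encode_mrmovq(disp: int, rB: str, rA: str) -> str:
    """Encode 'mrmovq disp(rB), rA' as space-separated hex bytes."""
    icode_ifun = 0x50                          # icode 5, ifun 0
    regs = (REG_ID[rA] << 4) | REG_ID[rB]      # rA in high nibble, rB in low nibble
    disp_bytes = disp.to_bytes(8, byteorder="little", signed=True)  # 8-byte D, little-endian
    encoded = bytes([icode_ifun, regs]) + disp_bytes
    return " ".join(f"{b:02X}" for b in encoded)


print(encode_mrmovq(0x1356, "%rbp", "%rdi"))   # 50 75 56 13 00 00 00 00 00 00
```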

**1(b)** [2 marks]

Consider the execution of the instruction “mrmovq 0x1356(%rbp), %rdi”. Assume that, just before this instruction executes, register %rbp holds 0x334 and the PC is 0x540. We use “**vm**” to represent the data read from the main memory.

**Describe** the steps done in the following stages: Fetch, Decode, Execute, Memory, Write Back, PC update, by filling in the blanks in the table below.

Note that you are required to fill in the generic form of each step in the second column; and in the third column, fill in the steps for the instruction “mrmovq 0x1356(%rbp), %rdi” with the above given values. If you think there should not be a step in some stage, just leave the blanks unfilled.

The symbol “←” means reading the value on the right side and assigning it to the left side. X:Y means assigning the highest 4 bits of a byte to X and the lowest 4 bits of the byte to Y.

***Answer*:**

|  |  |  |
| --- | --- | --- |
| **Stages** | **mrmovq D(rB), rA** | **mrmovq 0x1356(%rbp), %rdi** |
| Fetch | icode: ifun ← M1[PC]  rA:rB ← M1[PC+1]  valC ← M8[PC+2]  valP ← PC+10 | icode: ifun ← M1[0x540] = 5:0  rA:rB ← M1[0x541] = 7:5  valC ← M8[0x542] = 0x1356  valP ← 0x540+10 = 0x54A |
| Decode | valB ← R[rB] | valB ← R[%rbp] = 0x334 |
| Execute | valE ← valB + valC | valE ← 0x334 + 0x1356 = 0x168A |
| Memory | valM ← M8[valE] | valM ← M8[0x168A] = vm |
| Write back | R[ rA ] ← valM | R[ %rdi ] ← vm |
| PC update | PC ← valP | PC ← 0x54A |
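The six stages in the table can be traced with a short Python sketch. It is a simplified model, not the SEQ hardware: memory is assumed to be a dict from byte address to byte value, the register file a list indexed by register ID, and `read8`, `step_mrmovq`, and the concrete value standing in for **vm** are all hypothetical.

```python
MASK64 = (1 << 64) - 1


def read8(mem: dict, addr: int, signed: bool = False) -> int:
    """M8[addr]: read an 8-byte little-endian word; unset bytes read as 0."""
    raw = bytes(mem.get(addr + i, 0) for i in range(8))
    return int.from_bytes(raw, "little", signed=signed)


def step_mrmovq(mem: dict, regs: list, pc: int) -> dict:
    """Execute one mrmovq D(rB), rA at pc, following the six SEQ stages."""
    # Fetch
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF        # icode:ifun <- M1[PC]
    if (icode, ifun) != (5, 0):
        raise ValueError("not an mrmovq instruction")
    rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF     # rA:rB <- M1[PC+1]
    valC = read8(mem, pc + 2, signed=True)           # valC <- M8[PC+2]
    valP = pc + 10                                    # valP <- PC+10
    # Decode
    valB = regs[rB]                                   # valB <- R[rB]
    # Execute
    valE = (valB + valC) & MASK64                     # valE <- valB + valC
    # Memory
    valM = read8(mem, valE)                           # valM <- M8[valE]
    # Write back
    regs[rA] = valM                                   # R[rA] <- valM
    # PC update: the caller sets PC <- valP
    return {"valC": valC, "valE": valE, "valM": valM, "pc": valP}


# Instruction bytes at PC = 0x540 and %rbp = 0x334, as in the question
mem = {0x540 + i: b for i, b in enumerate(bytes.fromhex("50 75 56 13 00 00 00 00 00 00"))}
vm = 0x0123456789ABCDEF   # hypothetical stand-in for the unknown word "vm"
mem.update({0x168A + i: b for i, b in enumerate(vm.to_bytes(8, "little"))})
regs = [0] * 15           # register file indexed by Y86-64 register ID
regs[5] = 0x334           # %rbp

trace = step_mrmovq(mem, regs, 0x540)
print(hex(trace["valE"]), hex(trace["pc"]), regs[7] == vm)   # 0x168a 0x54a True
```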

**Question 2**. [3 marks]

Suppose a combinational logic circuit is implemented by 6 serially connected components, named A to F. The whole computation can be viewed as one instruction. The delay of each component is given below in picoseconds (ps), where 1 ps = $10^{-12}$ s. Each register operation takes 20 ps.

| Component | A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- | --- |
| Delay (ps) | 30 | 65 | 50 | 100 | 30 | 80 |

Throughput is defined as how many instructions can be executed on average in one second for a pipeline in the long run, and the unit of throughput is IPS, instructions per second.

Latency refers to the time from the start of the very first component until the last register operation finishes; the time unit for latency is ps.

For throughput, please write the result in the form $X.XX \times 10^{Y}$ IPS, where X.XX has one digit before the decimal point and two digits after it, and Y is the exponent.

**2(a)** Make the computation logic a 3-stage pipeline design that has the maximal throughput. Note that a register shall be inserted after each stage to separate their combinational logics. By default, a register will be inserted after the last stage, i.e., after step F. [1.5 marks]

* Please answer how to partition the stages.
* Please compute the throughput and latency for your pipeline design, with steps.

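Partition: stage 1 = ABC, stage 2 = D, stage 3 = EF

ABC: 30 + 65 + 50 = 145 ps

D: 100 ps

EF: 30 + 80 = 110 ps

The slowest stage (ABC, 145 ps) plus one register (20 ps) sets the clock period: 145 + 20 = 165 ps.

Throughput: $1 / ((145 + 20) \times 10^{-12}) = 6.06 \times 10^{9}$ IPS

Latency: 3 × 165 = 495 ps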
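As an arithmetic check, the Python sketch below computes throughput and latency for a given list of stage delays. `pipeline_metrics` and `REG_DELAY_PS` are hypothetical names; the sketch assumes, as in the hand calculation, that the clock period is the slowest stage plus one 20 ps register and that latency is the number of stages times that period.

```python
REG_DELAY_PS = 20  # register delay given in the question


def pipeline_metrics(stage_delays_ps):
    """Return (throughput in IPS, latency in ps); every stage ends with a register."""
    cycle_ps = max(stage_delays_ps) + REG_DELAY_PS   # slowest stage sets the clock
    throughput_ips = 1 / (cycle_ps * 1e-12)          # one instruction per cycle
    latency_ps = len(stage_delays_ps) * cycle_ps     # each stage takes one full cycle
    return throughput_ips, latency_ps


tp, lat = pipeline_metrics([30 + 65 + 50, 100, 30 + 80])   # ABC | D | EF
print(f"{tp:.2e} IPS, {lat} ps")                          # 6.06e+09 IPS, 495 ps
```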

**2(b)** Make the computation logic a 4-stage pipeline design that has the maximal throughput. Note that a register shall be inserted after each stage to separate their combinational logics. By default, a register will be inserted after the last stage, i.e., after step F. [1.5 marks]

* Please answer how to partition the stages.
* Please compute the throughput and latency for your pipeline design, with steps.

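Partition: stage 1 = AB, stage 2 = C, stage 3 = D, stage 4 = EF

AB: 30 + 65 = 95 ps

C: 50 ps

D: 100 ps

EF: 30 + 80 = 110 ps

The slowest stage (EF, 110 ps) plus one register (20 ps) sets the clock period: 110 + 20 = 130 ps.

Throughput: $1 / ((110 + 20) \times 10^{-12}) = 7.69 \times 10^{9}$ IPS

Latency: 4 × 130 = 520 ps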
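To confirm that no other choice of cut points does better, one can enumerate every way to split A to F into k contiguous stages. This brute-force Python sketch (`best_partition` is a hypothetical name) uses the same assumptions as the hand calculation: each stage ends with a 20 ps register and the slowest stage sets the clock.

```python
from itertools import combinations

DELAYS = {"A": 30, "B": 65, "C": 50, "D": 100, "E": 30, "F": 80}  # ps, from the question
REG_DELAY_PS = 20


def best_partition(k):
    """Return (stages, throughput in IPS, latency in ps) minimizing the slowest stage."""
    names = list(DELAYS)
    n = len(names)
    best = None
    for cuts in combinations(range(1, n), k - 1):   # cut positions between components
        bounds = (0, *cuts, n)
        stages = ["".join(names[a:b]) for a, b in zip(bounds, bounds[1:])]
        slowest = max(sum(DELAYS[c] for c in s) for s in stages)
        if best is None or slowest < best[0]:       # keep the first optimum found
            best = (slowest, stages)
    slowest, stages = best
    cycle_ps = slowest + REG_DELAY_PS
    return stages, 1 / (cycle_ps * 1e-12), k * cycle_ps


for k in (3, 4):
    stages, tp, lat = best_partition(k)
    print(k, " | ".join(stages), f"{tp:.2e} IPS", f"{lat} ps")
# 3 ABC | D | EF 6.06e+09 IPS 495 ps
# 4 AB | C | D | EF 7.69e+09 IPS 520 ps
```

For k = 3 there is a tie: ABC | DE | F also has a slowest stage of 145 ps, so it gives the same throughput and latency; the search simply reports the first optimum it meets. For k = 4, AB | C | D | EF is the only split that reaches 110 ps.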

**Question 3**. [4 marks]

The following byte sequence is the machine code of a function compiled with the Y86-64 instruction set (refer to Lecture 6). The memory address of the first byte is 0x1500. Note that the byte sequence is written in hexadecimal form, i.e., each digit/letter is one hexadecimal digit representing 4 bits, and two digits/letters represent one byte. **Assume the machine is a little-endian byte order machine.** Assume that by default the value in register %rax will be returned.

**30F33200000000000000** **30F10000000000000000** **30F00100000000000000** **702B15000000000000** **6031** **6103** **6233** **762715000000000000** **2010** **90**

Please write out the assembly instructions (in Y86-64 instruction set) corresponding to the machine codes given by the above bytes sequence, and explain what this function is computing.

**30F33200000000000000:**

0x1500: irmovq $50, %rbx

**30F10000000000000000:**

0x150A: irmovq $0, %rcx

**30F00100000000000000:**

0x1514: irmovq $1, %rax

**702B15000000000000:**

0x151E: jmp L2 (L2 = 0x152B)

**L1:**

**6031:**

0x1527: addq %rbx, %rcx

**6103:**

0x1529: subq %rax, %rbx

**L2:**

**6233:**

0x152B: andq %rbx, %rbx

**762715000000000000:**

0x152D: jg L1 (L1 = 0x1527)

**2010:**

0x1536: rrmovq %rcx, %rax

**90:**

0x1538: ret
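The listing can be cross-checked with a small disassembler for just the instructions that appear here (irmovq, rrmovq, OPq, jXX, ret). This Python sketch is not a full Y86-64 disassembler; `disassemble`, `REGS`, `OPS`, and `JUMPS` are hypothetical names, and the opcode and register tables follow the standard Y86-64 encoding.

```python
# Register names indexed by Y86-64 register ID (0x0-0xE; 0xF means "no register")
REGS = ["%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
        "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14"]
OPS = ["addq", "subq", "andq", "xorq"]                    # icode 6, ifun 0-3
JUMPS = ["jmp", "jle", "jl", "je", "jne", "jge", "jg"]    # icode 7, ifun 0-6


def disassemble(code: bytes, base: int) -> list:
    """Return (address, instruction) pairs for the Y86-64 subset used here."""
    listing, pc = [], 0
    while pc < len(code):
        icode, ifun = code[pc] >> 4, code[pc] & 0xF
        addr = base + pc
        if icode == 0x3:                    # irmovq V, rB: 10 bytes
            rB = code[pc + 1] & 0xF
            value = int.from_bytes(code[pc + 2:pc + 10], "little", signed=True)
            listing.append((addr, f"irmovq ${value}, {REGS[rB]}"))
            pc += 10
        elif icode == 0x7:                  # jXX Dest: 9 bytes, absolute target
            dest = int.from_bytes(code[pc + 1:pc + 9], "little")
            listing.append((addr, f"{JUMPS[ifun]} 0x{dest:X}"))
            pc += 9
        elif icode == 0x6:                  # OPq rA, rB: 2 bytes
            rA, rB = code[pc + 1] >> 4, code[pc + 1] & 0xF
            listing.append((addr, f"{OPS[ifun]} {REGS[rA]}, {REGS[rB]}"))
            pc += 2
        elif icode == 0x2 and ifun == 0:    # rrmovq rA, rB: 2 bytes
            rA, rB = code[pc + 1] >> 4, code[pc + 1] & 0xF
            listing.append((addr, f"rrmovq {REGS[rA]}, {REGS[rB]}"))
            pc += 2
        elif icode == 0x9:                  # ret: 1 byte
            listing.append((addr, "ret"))
            pc += 1
        else:
            raise ValueError(f"unsupported byte 0x{code[pc]:02X} at 0x{addr:X}")
    return listing


code = bytes.fromhex(
    "30F33200000000000000 30F10000000000000000 30F00100000000000000 "
    "702B15000000000000 6031 6103 6233 762715000000000000 2010 90"
)
for addr, text in disassemble(code, 0x1500):
    print(f"0x{addr:X}: {text}")
```

Note that 6103 decodes as subq %rax, %rbx (rA = 0, rB = 3), so it is %rbx, not %rcx, that is decremented each iteration.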

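Thus, %rbx acts as a counter that starts at 50 and is decreased by %rax = 1 in each iteration, while %rcx accumulates the running total; the loop repeats as long as %rbx > 0. The final total is copied into %rax and returned, so the function computes 50 + 49 + 48 + 47 + … + 3 + 2 + 1 = 1275.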
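Read as high-level code, the listing is a count-down loop. The Python sketch below is one possible translation (the function name `y86_function` is hypothetical), with each line commented by the instruction it mirrors. Because `jmp L2` jumps straight to the test, the body may run zero times, which matches a `while` loop rather than a do-while loop.

```python
def y86_function() -> int:
    rbx, rcx, rax = 50, 0, 1    # irmovq $50, %rbx; irmovq $0, %rcx; irmovq $1, %rax
    while rbx > 0:              # L2: andq %rbx, %rbx; jg L1
        rcx += rbx              # L1: addq %rbx, %rcx
        rbx -= rax              #     subq %rax, %rbx
    rax = rcx                   # rrmovq %rcx, %rax
    return rax                  # ret (result returned in %rax)


print(y86_function())           # 1275
```

The result agrees with the closed form 50 × 51 / 2 = 1275.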